Abstract. We study a model of games that combines concurrency, imperfect
information and stochastic aspects. These are finite-state games in which, at
each round, the two players choose, simultaneously and independently, an
action. Then a successor state is chosen according to some fixed probability
distribution depending on the previous state and on the pair of actions chosen
by the players. Imperfect information is modeled as follows: each player has
an equivalence relation over states and, instead of observing the exact state,
only knows to which equivalence class it belongs. Therefore, if two partial
plays are indistinguishable by some player, he should behave the same in both
of them. We consider reachability (does the play eventually visit a final
state?) and Büchi objectives (does the play visit a final state infinitely
often?). Our main contribution is to prove that the following problem is
complete for 2-ExpTime: decide whether the first player has a strategy that
almost-surely wins against any strategy of her opponent. We also characterise
the strategies needed by the first player to almost-surely win.



1 Introduction 

Perfect information turn based two-player games on a graph [10] are widely 
studied in computer science. Indeed, they are a useful tool for both theoretical 
(for instance the modern proofs of Rabin's complementation lemma rely on the 
memoryless determinacy of parity games [11]) and more practical applications. 
On the practical side, a major application of games is for the verification of 
reactive open systems. Those are systems composed of both a program and some 
(possibly hostile) environment. The verification problem consists of deciding 
whether the program can be restricted so that the system meets some given 
specification whatever the environment does. Here, restricting the system means 
synthesizing some controller, which, in terms of games, is equivalent to designing
a winning strategy for the player modeling the program [14]. 

The perfect information turn-based model, even if it suffices in many
situations, is somewhat weak for the following two reasons. First, it does not
capture the behavior of real concurrent models where, in each step, the
program and its environment independently choose moves, whose parallel
execution determines the next state of the system. Second, in this model both
players have, at all times, perfect information on the current state of the
play: this, for instance, makes it impossible to model a system where the
program and the environment share some public variables while also having
their own private variables [15].

In this paper, we remove those two restrictions by considering concurrent
stochastic games with imperfect information. These are finite-state games in
which, at each round, the two players choose simultaneously and independently
an action. Then a successor state is chosen according to some fixed probability
distribution depending on the previous state and on the pair of actions chosen
by the players. Imperfect information is modeled as follows: each player has
an equivalence relation over states and, instead of observing the exact state,
only sees to which equivalence class it belongs. Therefore, if two partial
plays are indistinguishable by some player, he should behave the same in both
of them. Note that this model naturally captures several models studied in the
literature [1, 9, 7, 8]. The winning conditions we consider here are reachability
(is a final state eventually visited?), Büchi (is a final state visited
infinitely often?) and their dual versions, safety and co-Büchi.

We study qualitative properties of those games (note that quantitative
properties, e.g. deciding whether the value of the game is above a given
threshold, are already undecidable in much weaker models [13]). More precisely,
we investigate the question of deciding whether some player can almost-surely
win, that is, whether he has a strategy that wins with probability 1 against any
counter-strategy of the opponent. Our main contribution is to prove that, for
both reachability and Büchi objectives, one can decide, in doubly exponential
time (which is proved to be optimal), whether the first player has an
almost-surely winning strategy. Moreover, when this is the case, we are also
able to construct such a strategy, which uses finite memory. We also provide new
intermediate results concerning positive winning in safety (and co-Büchi)
1½-player games (a.k.a. partially observable Markov decision processes).

Related work. Concurrent games with perfect information have been deeply
investigated in the last decade [2, 1, 7]. Games with imperfect information have
been considered for turn-based models [15] as well as for concurrent models with
only one imperfectly informed player [9, 8]. To our knowledge, the present paper
provides the first positive results on a model of games that combines
concurrency, imperfect information (on both sides) and a stochastic transition
function. In a recent independent work [4], Bertrand, Genest and Gimbert obtain
results similar to the ones presented here for a closely related model. The main
differences with our model are the following: Bertrand et al. consider a slightly
weaker model of games in which the players may observe their own actions, and
they allow the players to use richer strategies where the players can randomly
update their memory (note that those strategies, when used in our model, seem
strictly more powerful than the ones we consider [12]). Bertrand et al. also
discuss qualitative determinacy results and consider the case where a player is
more informed than the other. We refer the reader to [4] for a detailed
exposition.



2 Definitions 



A probability distribution over a finite set $X$ is a mapping $d : X \to [0,1]$
such that $\sum_{x \in X} d(x) = 1$. In the sequel we denote by $\mathcal{D}(X)$
the set of probability distributions over $X$.

Given some set $X$ and some equivalence relation $\sim$ over $X$, $[x]_\sim$ stands for
the equivalence class of $x$ for $\sim$ and $X/_\sim = \{[x]_\sim \mid x \in X\}$ denotes the set of
equivalence classes of $\sim$.

For some finite alphabet $A$, $A^*$ (resp. $A^\omega$) designates the set of finite (resp.
infinite) words over $A$.

2.1 Arenas 

A concurrent arena with imperfect information is a tuple
$\mathcal{A} = \langle S, \Sigma_E, \Sigma_A, \delta, \sim_E, \sim_A \rangle$, where

- $S$ is a finite set of control states;

- $\Sigma_E$ (resp. $\Sigma_A$) is the (finite) set of actions for Eve (resp. Adam);

- $\delta : S \times \Sigma_E \times \Sigma_A \to \mathcal{D}(S)$ is the (total) transition function;

- $\sim_E$ and $\sim_A$ are two equivalence relations over states.

A play in such an arena proceeds as follows. First it starts in some initial state
$s$. Then Eve picks an action $\sigma_E \in \Sigma_E$ and, simultaneously and independently,
Adam chooses an action $\sigma_A \in \Sigma_A$. Then a successor state is chosen according
to the probability distribution $\delta(s, \sigma_E, \sigma_A)$. Then the process restarts: the players
choose a new pair of actions that induces, together with the current state, a new
state and so on forever. Hence a play is an infinite sequence $s_0 s_1 s_2 \cdots$ in $S^\omega$ such
that for every $i \geq 0$, there exists $(\sigma_E, \sigma_A) \in \Sigma_E \times \Sigma_A$ with
$\delta(s_i, \sigma_E, \sigma_A)(s_{i+1}) > 0$. In the sequel we refer to a prefix of a play as a
partial play and we denote by $Plays(\mathcal{A})$ the set of all plays in arena $\mathcal{A}$.

The intuitive meaning of $\sim_E$ (resp. $\sim_A$) is that two states $s_1$ and $s_2$ such
that $s_1 \sim_E s_2$ (resp. $s_1 \sim_A s_2$) cannot be distinguished by Eve (resp. by Adam).
We easily extend the relation $\sim_E$ to partial plays: let $\lambda = s_0 s_1 \cdots s_n$ and
$\lambda' = s'_0 s'_1 \cdots s'_n$ be two partial plays, then $\lambda \sim_E \lambda'$ if and only if
$s_i \sim_E s'_i$ for all $i = 0, \ldots, n$.

Note that perfect information concurrent arenas (in the sense of [2, 1])
correspond to the special case where $\sim_E$ and $\sim_A$ are the equality relation over
$S$.

2.2 Strategies 

In order to choose their moves the players follow strategies, and, for this, they
may use all the information they have about what was played so far. However, if
two partial plays are equivalent for $\sim_E$, then Eve cannot distinguish them, and
should therefore behave the same. This leads to the following notion.



An observation-based strategy for Eve is a function $\varphi_E : (S/_{\sim_E})^* \to \mathcal{D}(\Sigma_E)$,
i.e., to choose her next action, Eve considers the sequence of observations
she got so far. In particular, a strategy $\varphi_E$ is such that $\varphi_E(\lambda) = \varphi_E(\lambda')$ whenever
$\lambda \sim_E \lambda'$. Observation-based strategies for Adam are defined similarly.

Of special interest are those strategies that do not require memory: a
memoryless observation-based strategy for Eve is a function from $S/_{\sim_E}$ to
$\mathcal{D}(\Sigma_E)$, that is to say such a strategy only depends on the current equivalence
class.

A uniform strategy for some player $X$ is a strategy $\varphi$ such that for every
partial play $\lambda$, the probability measure $\varphi(\lambda)$ is uniform, i.e., for every action
$\sigma_X \in \Sigma_X$, either $\varphi(\lambda)(\sigma_X) = 0$ or
$\varphi(\lambda)(\sigma_X) = 1 / |\{\sigma'_X \in \Sigma_X \mid \varphi(\lambda)(\sigma'_X) \neq 0\}|$.
The set of memoryless uniform strategies for $X$ is a finite set containing
$(2^{|\Sigma_X|} - 1)^{|S/_{\sim_X}|}$ elements.
Equivalently, those strategies can be seen as functions to (non-empty) sets of
(authorised) actions.
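
To make the view of memoryless uniform strategies as "functions to non-empty sets of authorised actions" concrete, here is a minimal Python sketch. The names `nonempty_subsets`, `memoryless_uniform_strategies` and `as_distribution`, as well as the toy classes and actions, are ours and purely illustrative. It enumerates one non-empty action set per observation class, so the number of strategies it produces is the number of non-empty subsets raised to the number of classes.

```python
from itertools import combinations, product


def nonempty_subsets(actions):
    """Return all non-empty subsets of `actions` as frozensets."""
    items = sorted(actions)
    return [
        frozenset(combo)
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


def memoryless_uniform_strategies(classes, actions):
    """Yield every map from observation classes to non-empty action sets."""
    subsets = nonempty_subsets(actions)
    for choice in product(subsets, repeat=len(classes)):
        yield dict(zip(classes, choice))


def as_distribution(allowed):
    """Turn a set of authorised actions into the uniform distribution on it."""
    return {action: 1 / len(allowed) for action in allowed}


# Hypothetical toy instance: two observation classes, three actions.
classes = ["c0", "c1"]
actions = {"a", "b", "c"}
strategies = list(memoryless_uniform_strategies(classes, actions))
print(len(strategies))  # (2**3 - 1) ** 2 strategies
```

Representing a uniform strategy by its sets of authorised actions keeps the enumeration finite and avoids storing probabilities at all; `as_distribution` recovers the distribution when needed.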

A finite-memory strategy for Eve with memory $M$ ($M$ being a finite
set) is some triple $\varphi = (Move, Up, m_0)$ where $m_0 \in M$ is the initial
memory, $Move : M \to \mathcal{D}(\Sigma_E)$ associates a distribution of actions with any
element of the memory $M$ and $Up : M \times S/_{\sim_E} \to M$ is a mapping updating the
memory with respect to some observation. One defines $\varphi(s_0) = Move(m_0)$ and
$\varphi(s_0 \cdots s_n) = Move(Up(\cdots Up(Up(m_0, [s_1]_{\sim_E}), [s_2]_{\sim_E}), \cdots, [s_n]_{\sim_E}))$ for
any $n \geq 1$. Hence, a finite-memory strategy is some observation-based strategy
that can be implemented by a finite transducer whose set of control states is $M$.
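
The transducer reading of a finite-memory strategy can be sketched in a few lines of Python. This is only an illustration under our own encoding assumptions: `move` and `up` are dictionaries, `obs` is a hypothetical map from states to labels of Eve's equivalence classes, and distributions are dictionaries from actions to probabilities. As in the definition, the memory starts at the initial memory and is updated with the observations of all states after the first one.

```python
class FiniteMemoryStrategy:
    """Observation-based strategy given by a triple (Move, Up, m0)."""

    def __init__(self, move, up, m0, obs):
        self.move = move  # memory -> distribution over Eve's actions
        self.up = up      # (memory, observation) -> memory
        self.m0 = m0      # initial memory
        self.obs = obs    # state -> label of its equivalence class for Eve

    def __call__(self, partial_play):
        """Return the distribution played after `partial_play`."""
        memory = self.m0
        # The first state is skipped: phi(s0) = Move(m0).
        for state in partial_play[1:]:
            memory = self.up[(memory, self.obs[state])]
        return self.move[memory]


# Hypothetical toy instance: s1 and s2 are indistinguishable for Eve.
obs = {"s0": "x", "s1": "y", "s2": "y"}
up = {(0, "x"): 0, (0, "y"): 1, (1, "x"): 0, (1, "y"): 1}
move = {0: {"a": 1.0}, 1: {"a": 0.5, "b": 0.5}}
strategy = FiniteMemoryStrategy(move, up, 0, obs)

print(strategy(["s0"]))
print(strategy(["s0", "s1"]))
```

Because the update only ever reads `obs[state]`, two partial plays that Eve cannot distinguish necessarily yield the same distribution, which is exactly the observation-based requirement.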



Remark 1. Note that in our definition of a strategy (and more generally in the
definition of a play) we implicitly assume that the players only observe the
sequence of states and not the corresponding sequence of actions. While the fact
that Eve does not observe what Adam played is rather fair (otherwise imperfect
information on states would make less sense), one could object that Eve should
observe the actions she played so far. Here, our view of a (randomised) strategy
is the following: when Eve respects some strategy, it means that whenever she
has to play, her strategy provides her with a distribution that she sends to some
scheduler that, together with the distribution chosen by Adam, picks the next
state. Indeed, this makes it possible, for instance, to model a system in which
some agent does not have the resources to implement randomisation himself.

An alternative option would be to consider that Eve flips a coin to pick her 
action and then sends this action to the scheduler that, together with the action 
chosen by Adam, picks the next state. In this case, a strategy should depend on 
the sequence of states together with the associated sequence of actions played by 
Eve. We argue that this second approach can be simulated easily by the first one, 
hence justifying our initial choice. Indeed, one can always enrich the set of states 
to encode the last pair of actions played and then use the equivalence relations 
$\sim_E$ / $\sim_A$ to hide / show part of this information to the respective players.



2.3 Probability Space and Outcomes of Strategies 

Let $\mathcal{A} = \langle S, \Sigma_E, \Sigma_A, \delta, \sim_E, \sim_A \rangle$ be a concurrent arena with imperfect
information, let $s_0 \in S$ be an initial state, $\varphi_E$ be a strategy for Eve and $\varphi_A$ be a
strategy for Adam. In the sequel we are interested in defining the probability of a
(measurable) set of plays knowing that Eve (resp. Adam) plays according to $\varphi_E$
(resp. $\varphi_A$). This is done in the classical way: first one defines the probability measure
for basic sets of plays (called here cones and corresponding to plays having some
initial common prefix) and then extends it in a unique way to all measurable
sets.

First define $Outcomes(s_0, \varphi_E, \varphi_A)$ to be the set of all possible plays when
the game starts in $s_0$ and when Eve and Adam play according to $\varphi_E$ and $\varphi_A$
respectively. More formally, an infinite play $\lambda = s_0 s_1 \cdots$ belongs to
$Outcomes(s_0, \varphi_E, \varphi_A)$ if and only if, for every $i \geq 0$, there is a pair of actions
$(\sigma_E, \sigma_A) \in \Sigma_E \times \Sigma_A$ with $\delta(s_i, \sigma_E, \sigma_A)(s_{i+1}) > 0$ and s.t.
$\varphi_E(s_0 s_1 \cdots s_i)(\sigma_E) > 0$ and $\varphi_A(s_0 s_1 \cdots s_i)(\sigma_A) > 0$ (i.e. $\sigma_X$ is possible
according to $\varphi_X$, for $X = E, A$).

Now, for any partial play $\lambda$, the cone for $\lambda$ is the set $cone(\lambda) = \lambda \cdot S^\omega$ of all
infinite plays with prefix $\lambda$. Denote by $Cones$ the set of all possible cones and let
$\mathcal{F}$ be the Borel $\sigma$-field generated by $Cones$ considered as a set of basic open sets
(i.e. $\mathcal{F}$ is the smallest set containing $Cones$ and closed under complementation,
countable union and countable intersection). Then $(Plays(\mathcal{A}), \mathcal{F})$ is a measurable space.

A pair of strategies $(\varphi_E, \varphi_A)$ induces a probability space over $(Plays(\mathcal{A}), \mathcal{F})$.
Indeed one can define a measure $\mu_{s_0}^{\varphi_E,\varphi_A} : Cones \to [0,1]$ on cones (this task is
easy as a cone is uniquely defined by a finite partial play) and then uniquely
extend it to a probability measure on $\mathcal{F}$ using the Carathéodory Unique Extension
Theorem. For this, one defines $\mu_{s_0}^{\varphi_E,\varphi_A}$ inductively on cones:

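- $\mu_{s_0}^{\varphi_E,\varphi_A}(s) = 1$ if $s = s_0$ and $\mu_{s_0}^{\varphi_E,\varphi_A}(s) = 0$ otherwise.

- For every partial play $\lambda$ ending in some vertex $s$,

$\mu_{s_0}^{\varphi_E,\varphi_A}(\lambda \cdot s') = \mu_{s_0}^{\varphi_E,\varphi_A}(\lambda) \cdot \sum_{(\sigma_E,\sigma_A) \in \Sigma_E \times \Sigma_A} \varphi_E(\lambda)(\sigma_E) \cdot \varphi_A(\lambda)(\sigma_A) \cdot \delta(s, \sigma_E, \sigma_A)(s')$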

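Denote by $Pr_{s_0}^{\varphi_E,\varphi_A}$ the unique extension of $\mu_{s_0}^{\varphi_E,\varphi_A}$ to a probability measure
on $\mathcal{F}$. Then $(Plays(\mathcal{A}), \mathcal{F}, Pr_{s_0}^{\varphi_E,\varphi_A})$ is a probability space.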
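
The inductive definition of the measure on cones translates directly into a loop over the partial play. The following Python sketch is illustrative only: the function name `cone_probability`, the encoding of the transition function as a dictionary `delta[(state, eve_action, adam_action)]` of successor probabilities, and the representation of strategies as functions from tuples of states to distributions are all our assumptions (an observation-based strategy would only look at the observations of that tuple).

```python
def cone_probability(play, s0, delta, phi_E, phi_A):
    """Probability of the cone of `play` when the game starts in `s0`."""
    if not play or play[0] != s0:
        return 0.0  # base case: only cones starting in s0 have mass
    prob = 1.0
    for i in range(len(play) - 1):
        prefix = tuple(play[: i + 1])
        state, successor = play[i], play[i + 1]
        d_E, d_A = phi_E(prefix), phi_A(prefix)
        # One inductive step: sum over all pairs of actions.
        step = sum(
            p_E * p_A * delta[(state, a_E, a_A)].get(successor, 0.0)
            for a_E, p_E in d_E.items()
            for a_A, p_A in d_A.items()
        )
        prob *= step
    return prob


# Hypothetical toy arena: from "s" the play moves to the final state "f"
# exactly when both players pick the same action; "f" is absorbing.
delta = {}
for a_E in ("h", "t"):
    for a_A in ("h", "t"):
        delta[("s", a_E, a_A)] = {"f": 1.0} if a_E == a_A else {"s": 1.0}
        delta[("f", a_E, a_A)] = {"f": 1.0}


def uniform(prefix):
    """Memoryless strategy playing both actions with probability 1/2."""
    return {"h": 0.5, "t": 0.5}


print(cone_probability(["s", "f"], "s", delta, uniform, uniform))
print(cone_probability(["s", "s", "f"], "s", delta, uniform, uniform))
```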

2.4 Objectives, Value of a Game 

Fix a concurrent arena with imperfect information $\mathcal{A}$. An objective for Eve is a
measurable set $O \subseteq Plays(\mathcal{A})$: a play is won by her if it belongs to $O$; otherwise it
is won by Adam. A concurrent game with imperfect information is a triple
$(\mathcal{A}, s_0, O)$ where $\mathcal{A}$ is a concurrent arena with imperfect information, $s_0$ is an
initial state and $O$ is an objective. In the sequel we focus on the following special
classes of objectives (note that all of them are Borel sets, hence measurable) that
we define by means of a subset $F \subseteq S$ of final states.



- A reachability objective is of the form $S^* F S^\omega$: a play is winning if it
eventually goes through some final state.

- A safety objective is the dual of a reachability objective, i.e. is of the form
$(S \setminus F)^\omega$: a play is winning if it never goes through a final state.

- A Büchi objective is of the form $\bigcap_{k \geq 0} S^k S^* F S^\omega$: a play is winning if it
goes infinitely often through final states.

- A co-Büchi objective is the dual of a Büchi objective, i.e. is of the form
$S^* (S \setminus F)^\omega$: a play is winning if it goes finitely often through final states.

A reachability (resp. safety, Büchi, co-Büchi) game is a game equipped with
a reachability (resp. safety, Büchi, co-Büchi) objective. In the sequel we may
replace $O$ by $F$ when it is clear from the context which winning condition we
consider.
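
The four objectives can be checked mechanically on ultimately periodic plays, i.e. plays of the form prefix followed by a cycle repeated forever. The Python sketch below is an illustration restricted to that special case (general plays need not be ultimately periodic); the function name `wins` and the string labels for objectives are ours.

```python
def wins(objective, prefix, cycle, final):
    """Decide whether the play prefix . cycle^omega satisfies `objective`."""
    if not cycle:
        raise ValueError("the cycle of an infinite play must be non-empty")
    visited = set(prefix) | set(cycle)  # states seen at least once
    recurring = set(cycle)              # states seen infinitely often
    if objective == "reachability":
        return bool(visited & final)
    if objective == "safety":
        return not (visited & final)
    if objective == "buchi":
        return bool(recurring & final)
    if objective == "cobuchi":
        return not (recurring & final)
    raise ValueError(f"unknown objective: {objective}")


# Hypothetical play: visits the final state "f" once, then loops on "s".
final = {"f"}
print(wins("reachability", ["s", "f"], ["s"], final))
print(wins("buchi", ["s", "f"], ["s"], final))
```

The duality stated above is visible in the code: the safety and co-Büchi tests are the negations of the reachability and Büchi tests.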

Fix a concurrent game with imperfect information $G = (\mathcal{A}, s_0, O)$. A strategy
$\varphi_E$ for Eve is almost-surely winning if, for any counter-strategy $\varphi_A$ for Adam,
$Pr_{s_0}^{\varphi_E,\varphi_A}(O) = 1$. If such a strategy exists, we say that Eve almost-surely wins
$G$. A strategy $\varphi_E$ for Eve is positively winning if, for any counter-strategy $\varphi_A$
for Adam, $Pr_{s_0}^{\varphi_E,\varphi_A}(O) > 0$. If such a strategy exists, we say that Eve positively
wins $G$.

3 Knowledge Arena 

For the rest of this section we let $\mathcal{A}$ be a concurrent arena with imperfect
information with $\mathcal{A} = \langle S, \Sigma_E, \Sigma_A, \delta, \sim_E, \sim_A \rangle$ and let $s_0 \in S$ be some initial state.

Note that in our model, the players do not observe the actions they play
but they may know the distribution they have chosen. Therefore, one could
consider a new arena in which the states have a component indicating the domain
of the last distribution chosen by Eve, and in which this component is visible only
to her (it is hidden from Adam by the equivalence relation $\sim_A$).

Even if she does not see the precise control state, Eve can deduce information
about it from previous information on the control state and from the set of
possible actions she just played. We refer to this as the knowledge of
Eve, which formally is a set of states. Assume Eve knows that the current state
belongs to some set $K \subseteq S$. After the next move Eve observes the equivalence
class $[s]_{\sim_E}$ of the new control state and she also knows the subset $D$ of actions
she may have played (it is the domain of the distribution she chose): hence she
can compute the set of possible states the play can be in. This is done using the
function $UpKnow : 2^S \times S/_{\sim_E} \times 2^{\Sigma_E} \to 2^S$ defined by letting

$UpKnow(K, [s]_{\sim_E}, D) = \{t \sim_E s \mid \exists r \in K, \sigma_E \in D, \sigma_A \in \Sigma_A \text{ s.t. } \delta(r, \sigma_E, \sigma_A)(t) > 0\}$,

i.e. in order to update her current knowledge, observing in which equivalence
class the new control state is, and knowing that she played an action in $D \subseteq \Sigma_E$,
Eve computes the set of all states in this class that may be reached from a state
in her former knowledge.
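
The knowledge update is a one-step reachability computation restricted to the observed class. A minimal Python sketch follows; the function name `up_know`, the dictionary encoding `delta[(state, eve_action, adam_action)]` of the transition function, and the toy arena are our illustrative assumptions, and the observed equivalence class is passed explicitly as a set of states.

```python
def up_know(K, obs_class, D, delta, actions_A):
    """States of `obs_class` reachable in one step from K with Eve playing in D."""
    return frozenset(
        t
        for r in K                 # some state Eve considers possible
        for a_E in D               # some action in the domain she chose
        for a_A in actions_A       # any action of Adam
        for t, p in delta[(r, a_E, a_A)].items()
        if p > 0 and t in obs_class
    )


# Hypothetical arena with states 0..3 where Eve cannot tell 1 from 2.
actions_A = {"x", "y"}
delta = {
    (0, "a", "x"): {1: 0.5, 3: 0.5},
    (0, "a", "y"): {1: 1.0},
    (0, "b", "x"): {2: 1.0},
    (0, "b", "y"): {2: 0.5, 3: 0.5},
}

print(up_know({0}, {1, 2}, {"a"}, delta, actions_A))
print(up_know({0}, {1, 2}, {"a", "b"}, delta, actions_A))
```

In this toy instance, knowing that she played only `a` lets Eve rule out state 2 even though she observes the class containing both 1 and 2, which is the extra information the domain component is meant to capture.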



Based on our initial remark and on the notion of knowledge we define the
knowledge arena associated with $\mathcal{A}$, denoted $\mathcal{A}^K$. The arena $\mathcal{A}^K$ is designed
to make explicit the information Eve can collect on her moves (i.e. the domain of
the distributions she plays) and on the possible current state. We define
$\mathcal{A}^K = \langle S^K, \Sigma_E^K, \Sigma_A, \delta^K, \sim_E^K, \sim_A^K \rangle$ as follows:

- $S^K = \{(s, K, D) \in S \times 2^S \times 2^{\Sigma_E} \mid K \subseteq [s]_{\sim_E}\}$: the first component is the
real state, the second one is the current knowledge of Eve and the third one
is the domain of the last distribution she played;

- $\Sigma_E^K = \Sigma_E \times (2^{\Sigma_E} \setminus \{\emptyset\})$: actions of Eve now contain information on the
domain of the distributions she picks;

- $\delta^K((s, K, D), (\sigma_E, D'), \sigma_A)(s', K', D'') = 0$ if $D' \neq D''$ or
$K' \neq UpKnow(K, [s']_{\sim_E}, D')$, and
$\delta^K((s, K, D), (\sigma_E, D'), \sigma_A)(s', K', D') = \delta(s, \sigma_E, \sigma_A)(s')$
otherwise: $\delta^K$ behaves as $\delta$ on the first components and deterministically
updates both the knowledge and the information on the domain;

- $(s, K, D) \sim_E^K (s', K', D')$ if and only if $K = K'$ (implying $s \sim_E s'$) and
$D = D'$: Eve observes her knowledge and the domain of her last distribution;

- $(s, K, D) \sim_A^K (s', K', D')$ if and only if $s \sim_A s'$.

The intuitive meaning of the enriched alphabet $\Sigma_E^K$ of Eve is that instead of
choosing a distribution $d : \Sigma_E \to [0, 1]$, Eve makes the domain
$Dom = \{\sigma \in \Sigma_E \mid d(\sigma) > 0\}$ of $d$ explicit by choosing the distribution $d^K$ where
$d^K(\sigma, D) = d(\sigma)$ if $D = Dom$ and $d^K(\sigma, D) = 0$ otherwise. We call such a
distribution $d^K$ well-formed (i.e. $d^K$ is obtained from some distribution $d$ as just
explained) and, in the sequel, whenever referring to strategies of Eve in $\mathcal{A}^K$, we
will mean functions from sequences of observations into well-formed distributions.

Consider an observation-based strategy $\varphi$ for Eve in the arena $\mathcal{A}$. Then it
can be converted into an observation-based strategy on the associated knowledge
arena. For this, remark that in the knowledge arena, those states reachable from
the initial state $(s_0, \{s_0\}, \emptyset)$ are of the form $(s, K, D)$ with all states in $K$ being
equivalent to $s$ with respect to $\sim_E$. Then one can define
$\varphi^K((s_0, K_0, D_0)(s_1, K_1, D_1) \cdots (s_n, K_n, D_n))$ as $d^K$ where
$d = \varphi([s_0]_{\sim_E} [s_1]_{\sim_E} \cdots [s_n]_{\sim_E})$ is the corresponding distribution given by $\varphi$.
Note that $\varphi^K$ is observation-based as, for all $0 \leq h \leq n$, $[s_h]_{\sim_E}$ is uniquely
defined from the $K_h$, which are observed by Eve in the knowledge arena.

Conversely, any observation-based strategy in the knowledge arena can be
converted into an observation-based strategy in the original arena. Indeed,
consider some observation-based strategy $\varphi^K$ in the knowledge arena: it is a mapping
from $(2^S \times 2^{\Sigma_E})^*$ into $\mathcal{D}(\Sigma_E^K)$ (the equivalence classes of the relation $\sim_E^K$ are, by
definition, isomorphic with $2^S \times 2^{\Sigma_E}$). Now, note that Eve can, while playing in
$\mathcal{A}$, remember the domain of the distributions she played and compute on the fly
her current knowledge (applying function $UpKnow$ to her previous knowledge
and to the domain of the last distribution played): hence along a play $s_0 s_1 \cdots s_n$
she can compute the corresponding sequence $(K_0, D_0)(K_1, D_1) \cdots (K_n, D_n)$ of
knowledge / domain. Now it suffices to consider the observation-based strategy
$\varphi$ for Eve in the initial arena defined by:

$\varphi(s_0 s_1 \cdots s_n) = \varphi^K((K_0, D_0)(K_1, D_1) \cdots (K_n, D_n))$



Note that this last transformation (taking a strategy $\varphi^K$ and producing a
strategy $\varphi$) is the inverse of the first transformation (taking a strategy $\varphi$ and
producing a strategy $\varphi^K$). In particular, it proves that the observation-based
strategies in both arenas are in bijection. It should be clear that the strategies
for Adam in both games are the same (as what he observes is identical).

Assume that $\mathcal{A}$ is equipped with a set $F$ of final states. Then one defines the
final states in $\mathcal{A}^K$ by letting $F^K = \{(f, K, D) \mid f \in F\} \cap S^K$: this makes it possible to
define an objective $O^K$ in $\mathcal{A}^K$ from an objective $O$ in $\mathcal{A}$. Based on the previous
observations, we derive the following.

Proposition 1. Let $G = (\mathcal{A}, s_0, O)$ be some imperfect information game equipped
with a reachability (resp. safety, Büchi, co-Büchi) objective. Let
$G^K = (\mathcal{A}^K, (s_0, \{s_0\}, \emptyset), O^K)$ be the associated game played on the knowledge arena.
Then for any strategies $\varphi_E, \varphi_A$ for Eve and Adam, the following holds:
$Pr_{s_0}^{\varphi_E,\varphi_A}(O) = Pr_{(s_0,\{s_0\},\emptyset)}^{\varphi_E^K,\varphi_A}(O^K)$. In particular, Eve has an almost-surely winning
observation-based strategy in $G$ if and only if she has one in $G^K$.

In the setting of the previous proposition, consider the special case where
Eve has an almost-surely winning observation-based strategy $\varphi^K$ in $G^K$ that
only depends on the current knowledge (in particular, it is memoryless). Then
the corresponding almost-surely winning observation-based strategy $\varphi$ in $G$ is, in
general, not memoryless, but can be implemented by a finite transducer whose
set of control states is precisely the set of possible knowledges for Eve. More
precisely, the strategy consists in computing and updating on the fly (using a
finite automaton) the value of the knowledge after the current partial play and
in picking the next action by solely considering the knowledge. We may refer to
such a strategy $\varphi$ as a knowledge-only strategy.

4 Decidability Results 

4.1 Reachability Objectives 

The main result of this section is the following. 

Theorem 1. For any reachability concurrent game with imperfect information, 
one can decide, in doubly exponential time, whether Eve has an almost-surely 
winning strategy. If Eve has such a strategy then she has a knowledge-only uni- 
form strategy, and such a strategy can be effectively constructed. 

Before proving Theorem 1 we first establish an intermediate result. A
concurrent game (with imperfect information) in which one player has only a single
available action is what we refer to as a 1½-player game with imperfect
information (those games are also known in the literature as partially observable
Markov Decision Processes). The following result is a key ingredient for the
proofs of Proposition 2 and Theorem 1.



Lemma 1. Consider a 1½-player safety game with imperfect information.
Assume that the player has an observation-based strategy that is positively winning.
Then she also has an observation-based finite-memory strategy that is positively
winning. Moreover, both the strategy and the set of positively winning states can
be computed in time $O(2^{|S|})$.

Proof (Sketch). Consider the knowledge arena and call a knowledge $K$ surely
winning if the player has a knowledge-based strategy that is surely winning
from any $(s, K, D)$ with $s \in K$ and $D \subseteq \Sigma_E$. We prove that if the player has
a positively winning strategy, then the set of surely winning knowledges is non-empty
and that it comes with a memoryless surely winning strategy (that consists
in staying in the surely winning component). This set also contains at least a
singleton {s} (meaning that if the player knows that she is in s then she can 
surely win): call such states s surely winning. Then, one proves that positively 
winning states are exactly those that are connected (in the graph sense) to some 
surely winning state by a path made of non-final states. Hence a positively 
winning strategy consists in playing some initial actions randomly (trying to 
reach a surely winning state) and then in mimicking a knowledge-only surely 
winning strategy. Complexity comes with a fixpoint definition of the previous 
objects. □ 
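
The graph-search step of this proof sketch can be illustrated as follows. This is a minimal sketch under strong assumptions: the set of surely winning states is taken as given (its fixpoint computation is not detailed in the sketch above), and the arena is abstracted into a hypothetical list `edges` containing a pair `(s, t)` whenever some action moves `s` to `t` with positive probability. The function name `positively_winning` is ours, and the code only covers the "connected by a path made of non-final states" characterisation, not the construction of the strategy.

```python
from collections import deque


def positively_winning(edges, final, surely_winning):
    """Backward search from surely winning states through non-final states."""
    predecessors = {}
    for source, target in edges:
        predecessors.setdefault(target, set()).add(source)

    # Start from the surely winning states (which must themselves be safe).
    seen = {s for s in surely_winning if s not in final}
    queue = deque(seen)
    while queue:
        target = queue.popleft()
        for source in predecessors.get(target, ()):
            if source not in final and source not in seen:
                seen.add(source)
                queue.append(source)
    return seen


# Hypothetical graph: state 3 is final, state 2 is assumed surely winning.
edges = [(0, 1), (1, 2), (0, 3), (3, 2), (4, 3), (2, 2)]
print(positively_winning(edges, final={3}, surely_winning={2}))
```

In this toy graph, state 4 is excluded because its only path to state 2 goes through the final state 3.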

Fix, for the rest of this section, a concurrent game with imperfect information
$G = (\mathcal{A}, s_0, O)$ equipped with a reachability objective $O$ defined from a set $F$
of final states. We set $\mathcal{A} = \langle S, \Sigma_E, \Sigma_A, \delta, \sim_E, \sim_A \rangle$. We also consider
$G^K = (\mathcal{A}^K, (s_0, \{s_0\}, \emptyset), O^K)$ to be the corresponding knowledge game.

To prove Theorem 1, one first defines (in a non-constructive way) a
knowledge-only uniform strategy $\varphi$ for Eve as follows. We let

$\mathcal{K}^{AS} = \{K \in 2^S \mid \exists \varphi_E$ knowledge-based strategy for Eve s.t. $\varphi_E$ is
almost-surely winning for Eve in $G^K$ from any $(s, K, D)$ with $s \in K$ and $D \subseteq \Sigma_E\}$

be the set of knowledges made only of almost-surely winning states for Eve (note
here that we require that the almost-surely winning strategy is the same for all
configurations with the same knowledge).

One can prove that, from a configuration with knowledge $K \in \mathcal{K}^{AS}$, Eve
always has at least one action which ensures that she remains in $\mathcal{K}^{AS}$, and we
define $\varphi$ as the knowledge-only uniform strategy that chooses at random one of
these safe actions. The next proposition shows that $\varphi$ is almost-surely winning
for Eve.

Proposition 2. The strategy $\varphi$ is almost-surely winning for Eve from states
in which Eve's knowledge is in $\mathcal{K}^{AS}$.

Proof (sketch). To prove that $\varphi$ is almost-surely winning, one needs to prove
that it is almost-surely winning against any strategy of Adam. However, once
$\varphi$ is fixed (and as it is a knowledge-only strategy), one gets a 1½-player game in
which only Adam is making choices. Proving that $\varphi$ is almost-surely winning is
therefore equivalent to proving that Adam cannot positively win in this new
game (for a safety objective). For this we use Lemma 1 to argue that it suffices
to prove that $\varphi$ is winning against any finite-memory strategy of Adam. This
fact permits us to conclude. □

Now one can prove Theorem 1. First, Eve almost-surely wins in $G$ if and only if
she almost-surely wins in $G^K$ if and only if $\{s_0\} \in \mathcal{K}^{AS}$, i.e. (using Proposition
2) if and only if Eve has an almost-surely winning knowledge-only uniform
strategy in $G^K$. Now, to decide whether Eve almost-surely wins $G$, it suffices to
check, for every possible knowledge-only uniform strategy $\varphi$ for her, whether it
is almost-surely winning. Once $\varphi$ is fixed, it leads, from Adam's point of view,
to a 1½-player safety game $G_\varphi$ where the player positively wins if and only if
$\varphi$ is not almost-surely winning. Hence Lemma 1 implies that deciding whether
$\varphi$ is almost-surely winning can be done in time exponential in the size of
$G_\varphi$, which itself is of exponential size in $|S|$. Hence deciding whether a
knowledge-only uniform strategy for Eve is winning can be done in doubly
exponential time (in the size of $G$). The set of knowledge-only uniform
strategies for Eve is finite and its size is doubly exponential in the size of
the game. Hence the overall procedure, which tests every such strategy, requires
doubly exponential time. As effectivity is immediate, this concludes the proof
of Theorem 1.
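
As a worked illustration of the size of the search space (our own back-of-the-envelope bound, not a statement taken from the proof): a knowledge-only uniform strategy assigns a non-empty set of authorised actions to each knowledge, and a knowledge is a subset of $S$, so the number of such strategies is at most the following.

```latex
\[
  \#\{\text{knowledge-only uniform strategies}\}
  \;\le\; \bigl(2^{|\Sigma_E|} - 1\bigr)^{2^{|S|}},
  \qquad \text{e.g. for } |S| = 3,\ |\Sigma_E| = 2:\quad
  (2^{2} - 1)^{2^{3}} = 3^{8} = 6561 .
\]
```

The exponent $2^{|S|}$ is what makes the enumeration doubly exponential.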

The naive underlying algorithm of Theorem 1 turns out to be optimal. 

Theorem 2. Deciding whether Eve almost-surely wins a concurrent game with
imperfect information is a 2-ExpTime-complete problem.

Proof (sketch). The proof is a generalisation of a similar result given in [8]
showing ExpTime-hardness of concurrent games in which only one player is
imperfectly informed. The idea is to simulate an alternating exponential space
Turing machine (without input). We design a game where the players describe the
run of such a machine: transitions from existential (resp. universal) states are
chosen by Eve (resp. Adam) and Adam is also in charge of describing the
successive configurations of the machine. To prevent him from cheating, Eve can
secretly mark a cell of the tape, and later check whether it was correctly
updated (if not she wins). As she cannot store the exact index of the cell (it
is of exponential size), she could cheat in the previous phase: hence Adam
secretly marks some bit and one recalls the value of the corresponding bit of
the index of the marked cell: this bit is checked when Eve claims that Adam
cheated (if it is wrong then she loses). Eve also wins if the described run is
accepting. Eve can also restart the computation whenever she wants (this is
useful when she cannot prove that Adam cheated): hence if the machine accepts,
the only option for Adam is to cheat, and Eve will eventually catch him with
probability one. Now if the machine does not accept, the only option for Eve is
to cheat, but it will be detected with positive probability. □

4.2 Büchi Objectives

We now consider the problem of deciding whether Eve almost-surely wins a
Büchi game. The results and techniques are similar to the ones for reachability
games. In particular, we need to establish the following intermediate result (the
proof is very similar to the one of Lemma 1 except that now the winning states
are those connected by any kind of path to a surely winning state).

Lemma 2. Consider a 1½-player co-Büchi game with imperfect information.
Assume that the player has an observation-based strategy that is positively
winning. Then she also has an observation-based finite-memory strategy that is
positively winning. Moreover, both the strategy and the set of positively winning
states can be computed in time $O(2^{|S|})$.

From Lemma 2 and extra intermediate results we derive our main result. 
Again, the key idea is to prove that the strategy that plays randomly inside the 
almost-surely winning region is an almost-surely winning strategy. 

Theorem 3. For any Büchi concurrent game with imperfect information, one
can decide, in doubly exponential time, whether Eve has an almost-surely
winning strategy. If Eve has such a strategy then she has a knowledge-based uniform
memoryless strategy, and such a strategy can be effectively constructed. The
doubly exponential time complexity bound is optimal.

5 Discussion 

The main contribution of this paper is to prove that one can decide whether 
Eve has an almost-surely winning strategy in a concurrent game with imperfect 
information equipped with a reachability objective or a Büchi objective.

A natural question is whether this result holds for other objectives, in
particular for co-Büchi objectives. In a recent work [3], Baier et al. established
undecidability of the emptiness problem for probabilistic Büchi automata on
infinite words. Such an automaton can be simulated by a 1½-player imperfect
information game: the states of the game are those of the automaton, they are
all equivalent for the player, and therefore an observation-based strategy is an
infinite word. Hence a pure (i.e. non-randomised) strategy in such a game
coincides with an input word for the automaton. From this fact, Baier et al. derived
that it is undecidable whether, in a 1½-player co-Büchi game with imperfect
information, Eve has an almost-surely winning pure strategy.

One can also consider the stochastic-free version of this problem (an arena
is deterministic iff $\delta(q, \sigma_E, \sigma_A)(q') \in \{0, 1\}$ for all $q, q', \sigma_E, \sigma_A$) and investigate
whether one can decide if Eve has an almost-surely winning strategy in a
deterministic game equipped with a co-Büchi objective. We believe that the 1½-player
setting can be reduced to this new one, hence allowing one to transfer undecidability
results [12]. An even weaker model to consider is the stochastic-free model in
which Adam has perfect information about the play [8].

It may happen that Eve has no almost-surely winning strategy while having
a family $(\varphi_\varepsilon)_{0 < \varepsilon < 1}$ of strategies such that $\varphi_\varepsilon$ ensures winning with probability at
least $1 - \varepsilon$. Such a family is called limit-surely winning. Deciding the existence of
such families is a very challenging problem: indeed, in many practical situations,
it is satisfying enough if one can control the risk of failing. Even if those questions
have been solved for perfect information games [1], as far as we know, no result
has yet been obtained in the imperfect information setting.

Even if the algorithms provided in this paper are "optimal", they are rather
naive (checking all strategies for Eve may cost a lot in practice). Hence, one 
should look for fixpoint-based algorithms as the one studied in [8]: it would be 
of great help for a symbolic implementation, and it could also be a useful step 
toward a solution of the problem of finding limit-surely winning strategies. Note 
that there are already efficient techniques and tools for finding sure winning 
strategies in subclasses of concurrent games with imperfect information [6,5]. 